Can we do better than frequency? A case study on extracting PP-verb collocations

Authors

  • Brigitte Krenn
  • Stefan Evert
Abstract

We argue that lexical association measures (AMs) should be evaluated against a reference set of collocations manually extracted from the full candidate data, and that the notion of collocation needs to be precisely defined so that human collocativity judgments and experimental results are reproducible. We show that identification results achieved by particular AMs do not crucially depend on text type, but that some AMs are much better suited for identifying some classes of collocations than others.

Similar resources

Acquisition of Phraseological Units from Linguistically Interpreted Corpora: A Case Study on German PP-Verb Collocations

In this paper, we show that accessibility of syntactic information eases collocation extraction from corpora, and supports identification of lexical and structural restrictions related to collocations. For collocation identification we use a corpus that is automatically annotated applying a part-of-speech tagger and a phrase chunker.

False Paraphrase Pairs in Spanish for Verbs and Verb+Noun Collocations

In this paper we have studied some pairs of paraphrases which are present in a linguistic resource called badele.3000, a database that contains more than 3,600 high-frequency Spanish nouns and 2,800 high-frequency Spanish verbs. The restricted combinatory of both kinds of words means more than 23,000 collocations, which are expressed by Lexical Functions, a tool of Meaning-Text Theory. Through...

Issues in defining/extracting collocations in Japanese and Korean: Empirical implications for building a collocation database

Collocations in Japanese and Korean have been studied extensively based on statistical tools. The criteria for collocations in these languages, however, have not been fully established in the literature, and it is not obvious whether all statistically significant combinations of words could be regarded as collocations. In this article, we point out empirical problems in extracting collocations ...

Experiments on Candidate Data for Collocation Extraction

The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).

A Corpus-Based Study on High Frequency Verb Collocations in the Case of “HAVE”

On the basis of a corpus-driven approach, this research investigates high-frequency verb collocations in the case of "have" by Chinese non-English major learners. Results show that despite the most frequent use of the verb "have", the learners make use of relatively few collocation types. The learners tend to simply overuse the words related to the topic or given by the writing direction...

Publication date: 2001